Abstract. In a Human-Computer Interaction context, we aim to elaborate
an adaptive and generic interaction model for two different use cases:
Embodied Conversational Agents and Creative Musical Agents for musical
improvisation. To reach this goal, we will use the concepts of
adaptation and synchronization to enhance the interactive abilities of our
agents and guide the development of our interaction model, and will try
to make synchrony emerge from the non-verbal dimensions of interaction.


1 Introduction 

Interaction can be defined as the ensemble of reciprocal actions and responses of
individuals and groups acting upon each other. It concerns verbal and nonverbal
communication, implying conscious and nonconscious, enduring and casual
processes. It can be considered globally as a continually emerging process [23].
This is nowadays a growing topic of interest in the fields of Computer-Human
Interaction and Social Signal Processing, where the dynamics of interaction are
used to achieve more seamless and believable interaction between humans and
artificial agents.

In this PhD project, our goal is to develop an interaction model using group
dynamics. It should be able to take into account the different modalities of
interaction, especially non-verbal communication. In particular, we aim at
designing a model that can enable the emergence of synchronization in the
interaction. We expect that synchronization processes, which are a form of
temporal adaptation, could make interaction dynamical in its various temporal
dimensions (e.g. a conversation turn, an entire dialogue, repeated interaction, etc.).
This model will then be applied to two use cases: Creative Musical Agents and
Embodied Conversational Agents.

In the following section, we will try to describe the basic concepts of our 
interaction model. We will then present our research questions and directions. 


2 Background 

Synchrony and adaptation Modeling interaction should ideally take dynamic
aspects into account, and the model should thus adapt to the evolution of the
system. For this reason, we chose to study adaptation and one form of temporal
adaptation: synchrony. Adaptation in interaction is the way the system evolves to
match the stimuli given by the interaction context [?]. This adaptation can be
performed at various levels of abstraction and timescales. Delaherche et al. [5]
propose a definition of synchrony. They describe synchrony as a dynamical and
reciprocal adaptation of the temporal structure of behaviors between interactive
partners. It should be at the same time dynamical, because its main features
concern temporality and not the actions themselves; multi-modal [?], as opposed
to simple imitation or the chameleon effect [7], which involve only one dimension;
and present in every interaction context, be it cooperative or competitive. In
adults, synchrony has two main roles. First, non-verbal synchrony would ease
the construction of individual social connections [?]. Synchrony could also
enhance cooperation between individuals, especially by increasing group cohesion
[?].


Nonverbal behavior and Turn-Taking Nonverbal behavior can be described “as
all actions distinct from speech” [10], although it takes into account
paralinguistic aspects of speech such as prosody. Nonverbal behavior during
communication can take various forms and use various channels of expression
(called modalities), such as gaze [14] or paralinguistic signals [?]. According to
Knapp et al., all human beings are natural “experts” in multimodal communication
[10], i.e. they are able to emit and receive signals simultaneously on different
modalities, knowingly or unknowingly. According to Argyle [3], non-verbal
communication fulfills four main functions: expressing emotions, conveying
interpersonal attitudes, presenting one’s personality, and accompanying speech.
This last function is essential in turn-taking mechanisms. A turn in conversation
can be defined as the interval between the taking of the floor and its withdrawal,
which can be either consensual or forced [10]. Overlapping turns can indicate a
conflict between speakers, but can also mean a high level of synchrony between
interactants, as they are able to decipher the cues that signal the abandoning of
the turn [3]. Turn-Taking mechanisms are the way humans regulate their
conversational interactions, and are perceivable through signals emitted by both
the speakers and their interlocutors. Duncan identifies three types of these
signals in conversation [?]: turn-passing signals, signals to keep or try to take
the turn from another speaker, and backchannels, which can convey anything from
a simple acknowledgement of the last utterance to an expression of the mental
state of the emitter [2]. These signals are essential to conversation and ensure
its fluidity [10].

3 Research Questions 

We intend to model the emergence of synchrony in Turn-Taking behavior in a group
of agents, whether they are taking part in a cooperative or a competitive
interaction. Our model describes an individual and the way it perceives other
agents. Each agent is able to generate a meaningful output (which from this point
on we will call conversation for clarity’s sake) and non-verbal cues through body
animation and para-linguistic features. The agent model is based on a Turn-Taking
system by Ravenet et al. (in press), which we modified. The model is shown in
Fig. 1.



Fig. 1. A general view of our turn-taking model. 


The Turn-Taking system can be modeled as a Finite State Machine
$A = \{\Sigma, S, s_0, \delta, F\}$ where:

– $S = \{\mathit{Unaddressed}, \mathit{Addressed}, \mathit{WantToSpeak}, \mathit{Speaking}, \mathit{InterruptionOfSpeech}, \mathit{EndOfSpeech}\}$: the conversational states of the agent

– $\Sigma$: the transition matrix

– $s_0 = \mathit{Unaddressed}$: the initial state

– $\delta : \Sigma \times S \rightarrow S$: the transition function

– $F = \emptyset$: the (empty) set of final states of the FSM

In this Turn-Taking system, the states S describe the current stance of each
agent regarding the conversation: unaddressed, addressed, wanting the turn,
speaking, being interrupted, or ending its speech and giving the floor to the
other participants. An agent does not know the exact conversational state of
the other agents, but is able to infer it from the non-verbal cues, backchannels
and speech it perceives; for instance, in a simple dyadic use case, an agent will
know it is addressed by another agent if it perceives that the other agent is
speaking, that it is oriented towards the first agent and that the first agent
displays cues of attention. Transitions between these states are guided by
interpersonal attitude [4], modeled through two dimensions: liking and
dominance. Liking can be defined as “a general and enduring positive or
negative feeling about some person, object or issue” [50], and dominance as
“the capacity of one agent to affect the behavior of another” [55]. Interpersonal
attitude is private to an agent and directed towards another agent. An example
of state transition could be


$$
\begin{cases}
\mathit{Speaking} & \text{if } \mathit{Mean}_{S,D} + |\mathit{Mean}_{S,L}| > 0 \\
\mathit{Speaking} & \text{if } \mathit{Count}_{S} = 0 \\
\mathit{WantToSpeak} & \text{otherwise}
\end{cases}
$$

where:

– $\mathit{Mean}_{S,D}$ is the mean of the dominance values felt by the agent towards
the other agents speaking at this moment

– $\mathit{Mean}_{S,L}$ is the mean of the liking values felt by the agent towards
the other agents speaking at this moment

– $\mathit{Count}_{S}$ is the number of other agents that are speaking at this moment
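
As a minimal sketch, the example transition above could be written in Python as follows. The function name, the use of plain lists of attitude values, and the assumption that the rule is evaluated for an agent currently in the WantToSpeak state are illustrative choices, not part of the model described here.

```python
from enum import Enum
from statistics import mean


class State(Enum):
    """The six conversational states S of the turn-taking FSM."""
    UNADDRESSED = "Unaddressed"
    ADDRESSED = "Addressed"
    WANT_TO_SPEAK = "WantToSpeak"
    SPEAKING = "Speaking"
    INTERRUPTION_OF_SPEECH = "InterruptionOfSpeech"
    END_OF_SPEECH = "EndOfSpeech"


def example_transition(dominance, liking):
    """Apply the example rule (hypothetical helper).

    dominance, liking: attitude values felt by this agent towards each
    agent that is speaking right now (one entry per current speaker).
    """
    if len(dominance) != len(liking):
        raise ValueError("one dominance and one liking value per speaker")
    count_s = len(dominance)
    # Nobody else is speaking: the agent can simply take the floor.
    if count_s == 0:
        return State.SPEAKING
    # Mean_{S,D} + |Mean_{S,L}| > 0: the agent takes the turn.
    if mean(dominance) + abs(mean(liking)) > 0:
        return State.SPEAKING
    return State.WANT_TO_SPEAK
```

The empty-speaker case is tested first only because the means are undefined when nobody is speaking; the outcome of the rule is unchanged.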

The dominance and the liking felt by an agent towards another agent can evolve
over time. For instance, an agent interrupting another agent will feel its
dominance value towards this agent increase, whereas the interrupted agent can
see its liking value towards the interrupter decrease. Since we use liking to
determine an agent’s drive to speak to another, a decrease in liking means that
the interrupted agent will be less inclined to speak with the one who
interrupted it. As these values determine the Turn-Taking behavior of the
system, we expect them to converge to a narrow range, adapting to changes in
the system while keeping it in a stable state, and therefore making the
Turn-Taking behavior synchronize. We intend to verify the existence of this
synchrony in our system through automated methods such as phase
synchronization [23] or mutual information [?], and also through subjective
evaluation by naive users, to verify that the synchronous behaviors observed in
the agents remain similar to the behaviors taking place in human-human
interaction. Since the agents have states that are inferred through observation,
we intend to use Hidden Markov Models to describe our FSM.
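
One of the automated measures mentioned above, mutual information, could be estimated between the state sequences of two agents as in the following sketch. It uses a plain plug-in (frequency-count) estimator over time-aligned discrete states; the function name and the choice of estimator are assumptions made for illustration, not the evaluation procedure of this project.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate, in bits, of the mutual information between two
    time-aligned sequences of discrete states (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts for agent X
    count_y = Counter(ys)            # marginal counts for agent Y
    count_xy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

With this estimator, two agents that alternate strictly between speaking and listening yield a high value, while agents whose states vary independently yield a value close to zero. On short sequences a plug-in estimate is biased upwards, so it should be read as a rough indicator only.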

Use cases: Existing Architectures The choice of musical improvisation as
a use case was motivated by the nature of the phenomenon itself: according
to Borgo et al., improvisation can be viewed as the “synchronization of our
intentions and our actions, and also the upholding of a connection, a sensibility
with group dynamics and evolutive experiences” [6]. The OMax system [18] (see
Fig. 2, right) is an automatic improvisation mechanism that relies on the notion
of stylistic reinjection [?], i.e. a system that extracts characteristic elements
of a musical sequence to devise a model describing the style of the played
sequence. After listening to a musical sequence played by a human
instrumentalist, it can replay a similar sequence presenting stylistically close
variations of what has already been played, thanks to the Factor Oracle [1].
Musical interaction between the musician and OMax is divided into two phases.
In the listening phase, OMax perceives the musical sequence, which is decomposed
note by note and stored in the memory of the system, where transitions between
non-consecutive but similar states are created thanks to the particular structure
of the Factor Oracle. In the playing phase, a human operator selects the
sequences and subsequences of the memory for the system to play. If the operator
selects transitions between non-consecutive states, he/she introduces variety in
the sequence while respecting the style of the sequences played by the human
musician.
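
To make the role of the Factor Oracle more concrete, the sketch below builds one incrementally, symbol by symbol, using the standard online construction with suffix links. It is a generic illustration, not the OMax implementation: the function names are hypothetical and the symbols stand for already-discretized notes.

```python
def build_factor_oracle(sequence):
    """Online Factor Oracle construction (illustrative sketch).

    Returns (transitions, suffix_link): transitions[i] maps a symbol to
    the state reached from state i, and suffix_link[i] points back to an
    earlier state sharing a common context with state i.
    """
    transitions = [{}]   # state 0 is the initial state
    suffix_link = [-1]   # the initial state has no suffix link
    for i, symbol in enumerate(sequence, start=1):
        transitions.append({})
        suffix_link.append(0)
        # Consecutive transition: state i-1 --symbol--> state i.
        transitions[i - 1][symbol] = i
        # Follow suffix links, adding forward jumps to the new state.
        k = suffix_link[i - 1]
        while k != -1 and symbol not in transitions[k]:
            transitions[k][symbol] = i
            k = suffix_link[k]
        suffix_link[i] = 0 if k == -1 else transitions[k][symbol]
    return transitions, suffix_link


def accepts(transitions, word):
    """Check whether the oracle can read `word` from the initial state."""
    state = 0
    for symbol in word:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return True
```

Every contiguous subsequence of the input is accepted by the resulting automaton. The suffix links connect non-consecutive states that share a recent context, which is one natural place to read off jumps of the kind described above.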

The GRETA-VIB [?] (see Fig. 2, left) system is a virtual embodied character
that uses a modular architecture independent of the agent’s embodiment.
This architecture follows the SAIBA framework, which specifies three modules:
the intent planner, the behavior planner and the behavior realizer. Modularity
is at the center of the GRETA-VIB architecture. In addition to the three
modules implementing the SAIBA framework, each designer can provide the
program with independent modules attached to these “backbone” modules, which
can specify the characteristics of the ECA, notably its behavior, independently
from the way it is embodied. One of these modules implements a Turn-Taking
mechanism. The Turn-Taking system is implemented as a Finite State Machine
(FSM) that specifies the current state of the agent and the different transitions
between states, depending on whether the agent is addressed or not and on the
interpersonal social attitude (modeled through the dominance and liking
dimensions). For now, each agent knows the state of all the other agents, but
knows neither the interpersonal attitudes held towards it or towards the other
agents, nor internal variables such as the number of people addressing a given
agent.


Fig. 2. A general view of the OMAX and GRETA architectures.


4 Current and Future work 

We first established a literature basis to ground our idea of a synchronous
interaction model and to apply it to our use cases. We are now looking to
implement a first simple prototype using Hidden Markov Models or their extension
called Influence Models [9]. To guide the emergence of synchrony, we are now
looking for metrics related to its expression and for ways to evaluate its
occurrence, and will be very interested in any input the community can provide
on these questions.